1. Read the dataset.

##   ID Duplicate     Study Ancestry Treated_micro Treated_survey Perceived_color
## 1  1         0 Cambridge        1             0              0        1. Black
## 2  2         0 Cambridge        1             0              0        1. Black
## 3  3         0 Cambridge        1             0              0        1. Black
## 4  4         0 Cambridge        1             0              0        1. Black
## 5  5         0 Cambridge        6             0              0        1. Black
## 6  6         0 Cambridge        1             0              0        1. Black
##    A500  A650 PTCA TTCA H_4AHP A650_A500 TTCA_PTCA H_4AHP_PTCA PTCA_A500
## 1 0.310 0.104  346   NA    9.0 0.3354839        NA  0.02601156  1116.129
## 2 0.191 0.059  194   NA    7.5 0.3089005        NA  0.03865979  1015.707
## 3 0.255 0.082  280   NA    9.7 0.3215686        NA  0.03464286  1098.039
## 4 0.262 0.085  333   NA    7.5 0.3244275        NA  0.02252252  1270.992
## 5 0.112 0.035  141   NA    4.5 0.3125000        NA  0.03191489  1258.929
## 6 0.209 0.067  250   NA    6.7 0.3205742        NA  0.02680000  1196.172
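The import step itself is not shown in the output above; a minimal sketch, assuming the data are read from a CSV file into the data frame `x` used in the later model calls (the file name here is a placeholder, since the source does not show it):

```r
# Sketch of the read step; "melanin_data.csv" is a placeholder name,
# since the actual file is not shown in the source.
x <- read.csv("melanin_data.csv", stringsAsFactors = FALSE)
head(x)  # prints the first six rows, as shown above
```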

2. Review the dataset.

For columns H to P:

  • Continuous quantitative variables.
  • They represent various proxy measures of melanins.
  • I would like to use them as the response variables.
  • There are five distinct chemical measures in columns H to L; columns M to P are combinations (ratios) of those. Thus, I decide to examine the relationships among the variables from H to L first.

For column A:

  • Discrete quantitative variable.
  • Identifies each tested sample.
  • However, the values appear disordered, as if listed at random.
  • I suspect they are a subset drawn from a much larger sample.

For column D:

  • Discrete quantitative variable.
  • Represents the number of ancestries each sample has.
  • Surprisingly, the column includes zero values.
  • I suspect a zero means no record exists in the data.

For columns B, E, F:

  • Categorical variables.
  • They take only the values 0 and 1, representing no and yes.
  • They are Duplicate, Treated_micro, and Treated_survey.

For column C:

  • Categorical variable representing the study each sample came from.
## [1] "Cambridge" "ADAPT"     "FEMMES"
  • We can see that there are three levels in this variable.

For column G:

  • Categorical variable.
  • Represents the perceived color of each sample.
## [1] "1. Black"        "2. Dark Brown"   "3. Medium Brown" "4. Light Brown" 
## [5] "5. Blond"        "6. Red"          "7. Grey"
  • We can see that there are seven levels in this variable.

For columns K and N:

  • They are quantitative variables.
  • However, they include many NA values.
  • The observed values in these two columns are much larger than zero.
  • Therefore, I plan either to drop these two columns or to replace the missing values with the mean of the observed values.
  • I will check the variance of the observed values to decide whether mean imputation is reasonable.
## [1] 594.8536
## [1] 1.08309
  • We get a variance of 594.8536 for TTCA and 1.08309 for TTCA_PTCA.
  • Given these variances, I do not think mean imputation is appropriate, so I will set these two columns aside.
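The two variances above can be reproduced with `var()` and `na.rm = TRUE`; a sketch, assuming the data frame is named `x` as in the model calls shown later:

```r
# Variance of the observed (non-NA) values in the two sparse columns;
# na.rm = TRUE drops the missing values before computing.
var(x$TTCA, na.rm = TRUE)       # 594.8536 in the output above
var(x$TTCA_PTCA, na.rm = TRUE)  # 1.08309 in the output above
```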

3. Explore the data.

I. First, I would like to see how the chemical measures vary across people of different perceived colors.

The graphs show that samples in the Black category appear to have the highest A500, A650, and PTCA. However, the differences are not very large, especially in the first two graphs, since the scales are small. Samples in the Red category, on the other hand, tend to have more H_4AHP, while all the other categories have fairly small H_4AHP values, so that difference appears substantial.
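The group comparisons described above can be drawn with base-R boxplots; a sketch, assuming `x` is the data frame (the actual plotting code is not shown in the source):

```r
# One boxplot of each chemical measure against perceived color.
op <- par(mfrow = c(2, 2), mar = c(8, 4, 2, 1))
boxplot(A500 ~ Perceived_color, data = x, las = 2, main = "A500")
boxplot(A650 ~ Perceived_color, data = x, las = 2, main = "A650")
boxplot(PTCA ~ Perceived_color, data = x, las = 2, main = "PTCA")
boxplot(H_4AHP ~ Perceived_color, data = x, las = 2, main = "H_4AHP")
par(op)
```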

II. Next, I would like to check the influence of ancestry.

First, I remove the zero values from the Ancestry column, since they appear to be meaningless (likely missing records). Then I make plots to examine the influence.

I do not see a striking pattern between the chemical measures and ancestry, so I put them into boxplots instead.
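The filtering and plotting steps can be sketched as follows (assuming `x` is the data frame):

```r
# Drop rows with a zero ancestry value, then boxplot each measure.
y <- subset(x, Ancestry > 0)
boxplot(A500 ~ Ancestry, data = y, main = "A500 by ancestry")
boxplot(H_4AHP ~ Ancestry, data = y, main = "H_4AHP by ancestry")
```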

From the graphs of A500, A650, and PTCA, we can see that most points fall at ancestry values 1 and 2, where the ranges are also the largest. For ancestries 1, 3, and 6, the chemical measures tend to be larger. For H_4AHP, however, the mode of the data clearly falls at ancestry 2, which also contains the highest value and the median.

III. Let's further explore whether there are relationships among the chemical measures.

The three scatterplots display a clear positive linear relationship, so I expect a simple linear regression model to fit each pair well. I therefore fit the models below.
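The three fits can be produced with `lm()`; a sketch assuming the data frame `x`, matching the `Call:` lines in the output:

```r
# Simple linear regressions for each pair of chemical measures.
m1 <- lm(A500 ~ A650, data = x)
m2 <- lm(A500 ~ PTCA, data = x)
m3 <- lm(A650 ~ PTCA, data = x)
summary(m1); summary(m2); summary(m3)
```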

## 
## Call:
## lm(formula = A500 ~ A650, data = x)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.016458 -0.005606 -0.001925  0.003501  0.032437 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.014820   0.001362   10.88   <2e-16 ***
## A650        2.914876   0.024791  117.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.008984 on 131 degrees of freedom
## Multiple R-squared:  0.9906, Adjusted R-squared:  0.9905 
## F-statistic: 1.382e+04 on 1 and 131 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = A500 ~ PTCA, data = x)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.073845 -0.014979 -0.001507  0.015623  0.106760 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.078e-02  3.831e-03   2.814  0.00565 ** 
## PTCA        7.943e-04  1.879e-05  42.266  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02424 on 131 degrees of freedom
## Multiple R-squared:  0.9317, Adjusted R-squared:  0.9312 
## F-statistic:  1786 on 1 and 131 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = A650 ~ PTCA, data = x)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.023793 -0.004327  0.000182  0.003472  0.039547 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.388e-03  1.220e-03  -1.138    0.257    
## PTCA         2.725e-04  5.983e-06  45.543   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.007717 on 131 degrees of freedom
## Multiple R-squared:  0.9406, Adjusted R-squared:  0.9401 
## F-statistic:  2074 on 1 and 131 DF,  p-value: < 2.2e-16

From the results, we see that the p-values for the first and second models are very small and the adjusted R-squared values are above 0.9. For the third model, however, the p-value for the intercept is large, even though its adjusted R-squared is also high. That model therefore needs further checking.

Let's look at the confidence intervals.
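The intervals are obtained with `confint()`; a self-contained sketch that refits the three pairwise models first (assuming `x` is the data frame):

```r
# 95% confidence intervals for each model's intercept and slope.
m1 <- lm(A500 ~ A650, data = x)
m2 <- lm(A500 ~ PTCA, data = x)
m3 <- lm(A650 ~ PTCA, data = x)
confint(m1); confint(m2); confint(m3)
```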

##                  2.5 %     97.5 %
## (Intercept) 0.01212621 0.01751386
## A650        2.86583262 2.96391879
##                    2.5 %       97.5 %
## (Intercept) 0.0032014187 0.0183571045
## PTCA        0.0007571028 0.0008314545
##                     2.5 %       97.5 %
## (Intercept) -0.0038011631 0.0010243396
## PTCA         0.0002606674 0.0002843407

From the confidence intervals of the three models, the intervals for models 1 and 2 do not contain zero, which again indicates a relationship. For model 3, however, the intercept interval does contain zero, so the result is less than ideal.


Let's further check the correlation tests.
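The tests below use `cor.test()`, matching the `data:` lines in the output; a sketch assuming the data frame `x`:

```r
# Pearson correlation tests for each pair of measures.
cor.test(x$A650, x$A500)
cor.test(x$PTCA, x$A500)
cor.test(x$PTCA, x$A650)
```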

## 
##  Pearson's product-moment correlation
## 
## data:  x$A650 and x$A500
## t = 117.58, df = 131, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9933715 0.9966618
## sample estimates:
##       cor 
## 0.9952954
## 
##  Pearson's product-moment correlation
## 
## data:  x$PTCA and x$A500
## t = 42.266, df = 131, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9513183 0.9752239
## sample estimates:
##       cor 
## 0.9652351
## 
##  Pearson's product-moment correlation
## 
## data:  x$PTCA and x$A650
## t = 45.543, df = 131, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9577306 0.9785220
## sample estimates:
##       cor 
## 0.9698426

From these results, we see that all three p-values are very small, and the correlation estimates are large: they are all above 0.9, which indicates strong linear relationships.

Let's further test whether the slopes are significant.
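In simple linear regression, the ANOVA F test on the single predictor is equivalent to the t test on its slope; a sketch, assuming `x` is the data frame:

```r
# ANOVA tables for the three pairwise fits.
anova(lm(A500 ~ A650, data = x))
anova(lm(A500 ~ PTCA, data = x))
anova(lm(A650 ~ PTCA, data = x))
```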

## Analysis of Variance Table
## 
## Response: A500
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## A650        1 1.11570 1.11570   13824 < 2.2e-16 ***
## Residuals 131 0.01057 0.00008                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Analysis of Variance Table
## 
## Response: A500
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## PTCA        1 1.04932 1.04932  1786.4 < 2.2e-16 ***
## Residuals 131 0.07695 0.00059                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Analysis of Variance Table
## 
## Response: A650
##            Df   Sum Sq Mean Sq F value    Pr(>F)    
## PTCA        1 0.123512 0.12351  2074.2 < 2.2e-16 ***
## Residuals 131 0.007801 0.00006                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results show that the p-values are very small for all three models, so the slopes appear to be significant.

Lastly, let's check the normality assumption for the residuals.

From the diagnostic plots, the residuals appear to have constant variance, but the normal Q-Q plot shows pronounced curvature for the first model. The second and third models seem to meet the normality requirement.

Let's look for outliers.
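The sorted listings below are standardized residuals, as returned by `rstandard()`; a sketch that refits the three models (assuming `x` is the data frame):

```r
# Standardized residuals, sorted ascending; values beyond about 2 in
# absolute size are flagged as potential outliers.
m1 <- lm(A500 ~ A650, data = x)
m2 <- lm(A500 ~ PTCA, data = x)
m3 <- lm(A650 ~ PTCA, data = x)
sort(rstandard(m1)); sort(rstandard(m2)); sort(rstandard(m3))
```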

##          135          124           89           77            9          116 
## -1.844912277 -1.822951089 -1.653918283 -1.585726276 -1.421650077 -1.353602868 
##           75           94           70           50           71           92 
## -1.259928575 -1.225690513 -1.196502352 -1.109573500 -1.065230889 -1.060227079 
##           83          121           24           20          134          126 
## -1.011479228 -0.997056037 -0.973809214 -0.973768404 -0.967662993 -0.943770511 
##           16            1           90           54           72            8 
## -0.936739657 -0.902309500 -0.880118444 -0.857264151 -0.819057254 -0.806161841 
##           93           79           68           74           43          125 
## -0.797196338 -0.774213408 -0.764635675 -0.761948914 -0.733472800 -0.718913725 
##           37           27           52           34          127           46 
## -0.676647293 -0.646581580 -0.640713586 -0.628263714 -0.626025547 -0.593279500 
##           44           76            5           36           53          123 
## -0.555394742 -0.547982431 -0.541080399 -0.538477076 -0.536468431 -0.528738830 
##          114           58           60           91           35           86 
## -0.509396182 -0.505996783 -0.496392490 -0.486685491 -0.481518121 -0.480445010 
##           40           49          120           69          105           29 
## -0.450179192 -0.421868082 -0.397272703 -0.388742495 -0.358784040 -0.358358279 
##          112           95           84           48           57          108 
## -0.339580491 -0.333290797 -0.329988111 -0.320475417 -0.311002345 -0.285149223 
##          106           33           56           31           80           18 
## -0.275541881 -0.262101482 -0.254127211 -0.246435411 -0.234354840 -0.216159265 
##           45           14           63           32           21           98 
## -0.216159265 -0.168616613 -0.165424890 -0.153520105 -0.141952811 -0.137547548 
##            6           10            4           64           38           82 
## -0.125005830 -0.125005830 -0.065708983 -0.063565654 -0.048832740 -0.041794641 
##          109           22           19           30           12          117 
##  0.003503465  0.035360020  0.037134281  0.039227175  0.065897818  0.068085028 
##          111           47            3           23          115           99 
##  0.070278533  0.097608787  0.130313517  0.137093768  0.146382449  0.163344694 
##          107          119          100          110           96           39 
##  0.172849661  0.190025428  0.209036845  0.247015608  0.275468173  0.290890430 
##           81          118           15           55           73           42 
##  0.332318245  0.340297726  0.380371905  0.393683672  0.396341749  0.450765050 
##           25            2           17           97            7           28 
##  0.463538915  0.469891557  0.520635989  0.537427105  0.604513154  0.652980633 
##          128          113           67          122          102           51 
##  0.733316062  0.817389037  0.836631528  0.845779892  0.892310810  0.907550475 
##           13           61           66           78           41          104 
##  0.940725027  1.156636909  1.204139989  1.256125829  1.458267161  1.593482510 
##          130          129          103          132           62           65 
##  1.642082533  1.723935989  1.751376147  1.763539507  1.777997195  1.860473027 
##          101          131           85          133           26           59 
##  1.921046641  1.988327233  2.114915979  2.127563333  3.279715404  3.324793379 
##           11 
##  3.662443365
##           58           52           30           90           76           46 
## -3.112080075 -2.206717007 -1.899460644 -1.889051385 -1.885828497 -1.782682303 
##           28           92           57           74           35           54 
## -1.538627223 -1.285520291 -1.284652057 -1.284020249 -1.213103281 -1.144329965 
##           38           91           50           45           56          116 
## -1.142006352 -1.084778865 -1.073700148 -1.063732958 -1.032595565 -1.025983382 
##           70           49          115           26           75           15 
## -1.008517942 -0.963252533 -0.905843027 -0.861397321 -0.855853364 -0.851337392 
##           31           36           43          124           48           93 
## -0.840574229 -0.838956778 -0.812325798 -0.742799488 -0.714027783 -0.707865313 
##           32           83           55           94           53           68 
## -0.684898598 -0.656490964 -0.627463247 -0.621580273 -0.607103372 -0.594925127 
##            4           72           37           89           51          106 
## -0.554222190 -0.548642878 -0.527457785 -0.522120642 -0.522013329 -0.494459170 
##           33            5           82          114          102           44 
## -0.494450824 -0.446280470 -0.431842668 -0.402154993 -0.398856818 -0.394604169 
##          126           73          110          112           29            9 
## -0.391759683 -0.337929735 -0.260946771 -0.255954216 -0.254932977 -0.254371623 
##           24           71           69           62           98          108 
## -0.251573794 -0.234410607 -0.201705084 -0.199256883 -0.194016401 -0.193680846 
##           81           40           95           86           34          118 
## -0.158531129 -0.144608656 -0.143880151 -0.140366776 -0.068366847 -0.062723285 
##          134           47          119            6          125          128 
## -0.062723285 -0.016769751 -0.015389325 -0.014479099 -0.003779126  0.019066777 
##           80          121          117           84          105           10 
##  0.029080374  0.045362169  0.056264674  0.059794054  0.062109640  0.084387896 
##           67          111          127          107          109          122 
##  0.085235978  0.157268116  0.165577104  0.196334945  0.199299221  0.229667946 
##          100           12           97           25          129           16 
##  0.254485408  0.268243100  0.278924534  0.335871014  0.347173226  0.375823983 
##           27          120          135           96           99           79 
##  0.377965685  0.448782464  0.462833073  0.476773746  0.481483332  0.498754252 
##           59           19           42          113          104          123 
##  0.548810458  0.587378128  0.636536460  0.650284022  0.651224522  0.717166601 
##          132          101           17           14          130          131 
##  0.740388894  0.796906584  0.804054099  0.819826286  0.869041923  0.873545652 
##           85            7            3           21            1           13 
##  0.882628641  0.882685053  0.907129748  0.991456027  1.020150980  1.040105165 
##           61            2           39          133           77           78 
##  1.050767632  1.082428537  1.111093503  1.194122113  1.198144094  1.233885393 
##            8          103           18           66           60           23 
##  1.234275208  1.247358082  1.250640421  1.259253043  1.287951849  1.356165158 
##           65           22           41           63           64           11 
##  1.379073108  1.609781849  1.650247748  1.792496946  1.962663605  2.745536387 
##           20 
##  4.445738776
##          58          26          52          30          28          76 
## -3.14931902 -2.24048971 -2.12155445 -2.06245930 -1.92028344 -1.81299185 
##          90          46          57          38          35          74 
## -1.68440123 -1.68363033 -1.25987843 -1.21096649 -1.11472246 -1.07913604 
##          15          45         115          56          91          92 
## -1.06980126 -1.05983734 -1.03437220 -1.01109742 -0.97454080 -0.96211754 
##          62          51          54          49          55          31 
## -0.92613257 -0.92541763 -0.89069283 -0.86923229 -0.83403851 -0.80731836 
##         102          59          50          36          32          48 
## -0.78549662 -0.73657135 -0.71386135 -0.68890309 -0.67674023 -0.64138100 
##          70          43           4         116          73          82 
## -0.60897536 -0.58230574 -0.57114378 -0.56515341 -0.52266106 -0.44847379 
##          93          53          33         106          75         110 
## -0.44434992 -0.43988691 -0.42819080 -0.42262622 -0.41904343 -0.37959341 
##          68         129          81          83          37         128 
## -0.33563815 -0.31403732 -0.30337118 -0.30337118 -0.29809193 -0.27202153 
##           5          72          67         114          44         118 
## -0.26473680 -0.26403300 -0.24214677 -0.22973765 -0.20339807 -0.20327497 
##          94          98         112          29         108         119 
## -0.18014204 -0.15389184 -0.14003158 -0.13165326 -0.09461969 -0.09225563 
##         122         124          69          47         126          95 
## -0.09015149 -0.07192120 -0.06210720 -0.05744389 -0.04483349 -0.02166985 
##          40         117           6          86         104          97 
##  0.02381844  0.03366341  0.03419808  0.04085643  0.06520576  0.08598825 
##         101         132          89          85          24          80 
##  0.09124434  0.09322785  0.09827647  0.10622111  0.11770089  0.12503214 
##          10         111         107         131          71          25 
##  0.14072406  0.14154960  0.14266348  0.14676469  0.17310953  0.17664193 
##          34         100          84         105         109          12 
##  0.17707122  0.19088469  0.19642704  0.21046074  0.21356481  0.26247399 
##         130         125           9         134         113          96 
##  0.28006575  0.28363918  0.29352593  0.31944032  0.37408546  0.40378625 
##         127         133         121          99          42          19 
##  0.42911927  0.43600385  0.44783816  0.45368361  0.50583059  0.61787082 
##         120         103          17          27          61           7 
##  0.64268250  0.64372563  0.65841948  0.66500385  0.67026887  0.70973457 
##          65          13          16          78          79          66 
##  0.74328760  0.74521796  0.77873741  0.82728833  0.84696913  0.87639108 
##           3          14           2         123          39          21 
##  0.92527950  0.95037173  0.97870302  0.98467743  1.08105699  1.12474084 
##          41         135          23          18           1          11 
##  1.19670350  1.23787257  1.40649814  1.43352370  1.45781825  1.50072784 
##          60           8          22          77          63          64 
##  1.58494613  1.65056323  1.72011257  1.92616535  1.99675146  2.13978787 
##          20 
##  5.17230102

There are 5 outliers for the first model and 2 each for the second and third models, with standardized residuals above 2.

In order to obtain a better-behaved model, I would like to apply transformations.
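The transformed fits match the `Call:` lines below; a sketch, assuming `x` is the data frame:

```r
# Power transformations of the responses, refit on the same predictors.
t1 <- lm(A500^0.8 ~ A650, data = x)
t2 <- lm(sqrt(A500) ~ PTCA, data = x)
t3 <- lm(A650^0.6 ~ sqrt(PTCA), data = x)
summary(t1); summary(t2); summary(t3)
```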

## 
## Call:
## lm(formula = A500^(0.8) ~ A650, data = x)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.034677 -0.006795 -0.000521  0.005849  0.041903 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.052589   0.001859   28.28   <2e-16 ***
## A650        3.437719   0.033853  101.55   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01227 on 131 degrees of freedom
## Multiple R-squared:  0.9875, Adjusted R-squared:  0.9874 
## F-statistic: 1.031e+04 on 1 and 131 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = sqrt(A500) ~ PTCA, data = x)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.108241 -0.018086 -0.000211  0.018876  0.092945 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.762e-01  4.858e-03   36.27   <2e-16 ***
## PTCA        1.086e-03  2.383e-05   45.55   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03074 on 131 degrees of freedom
## Multiple R-squared:  0.9406, Adjusted R-squared:  0.9402 
## F-statistic:  2075 on 1 and 131 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = (A650)^(0.6) ~ sqrt(PTCA), data = x)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.034565 -0.009414 -0.000880  0.010318  0.062456 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.0260570  0.0037224   -7.00  1.2e-10 ***
## sqrt(PTCA)   0.0140647  0.0002851   49.33  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01549 on 131 degrees of freedom
## Multiple R-squared:  0.9489, Adjusted R-squared:  0.9485 
## F-statistic:  2433 on 1 and 131 DF,  p-value: < 2.2e-16

Now the diagnostic plots look much better, especially with respect to the normality assumption for model 1. Moreover, the intercept p-value for model 3 is much smaller than before, even though the residuals are not as well behaved as I had hoped.

Let's examine the remaining measure, H_4AHP, against the other three.

## 
## Call:
## lm(formula = A500 ~ H_4AHP, data = x)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.134575 -0.083009 -0.007986  0.075965  0.211277 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.1564502  0.0085648  18.267  < 2e-16 ***
## H_4AHP      -0.0004306  0.0001484  -2.902  0.00435 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08988 on 131 degrees of freedom
## Multiple R-squared:  0.0604, Adjusted R-squared:  0.05322 
## F-statistic: 8.421 on 1 and 131 DF,  p-value: 0.004354

## 
## Call:
## lm(formula = A650 ~ H_4AHP, data = x)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.044536 -0.027118 -0.001468  0.024998  0.074561 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.903e-02  2.898e-03  16.919   <2e-16 ***
## H_4AHP      -1.663e-04  5.021e-05  -3.312   0.0012 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03041 on 131 degrees of freedom
## Multiple R-squared:  0.07725,    Adjusted R-squared:  0.0702 
## F-statistic: 10.97 on 1 and 131 DF,  p-value: 0.001199

## 
## Call:
## lm(formula = PTCA ~ H_4AHP, data = x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -175.566  -99.990   -0.532   85.921  237.367 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 185.2349    10.2741  18.029  < 2e-16 ***
## H_4AHP       -0.6187     0.1780  -3.476  0.00069 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 107.8 on 131 degrees of freedom
## Multiple R-squared:  0.08445,    Adjusted R-squared:  0.07746 
## F-statistic: 12.08 on 1 and 131 DF,  p-value: 0.0006905

The shapes of these three scatterplots are interesting: they are strongly right-skewed. From the summary tables, we can also see that these pairs do not show a strong relationship, since the adjusted R-squared values are very small.

Conclusion

In conclusion, I first explored the relationship between perceived color and the chemical measures. From the graphs, I found that samples in the Black category have the highest A500, A650, and PTCA, while samples in the Red category tend to have more H_4AHP, and that difference in scale appears large. I then explored the relationship between ancestry and the chemical measures. Most of the A500, A650, and PTCA data fall at ancestry 1 or 2, and ancestries 1, 3, and 6 tend to have larger chemical values. For H_4AHP, most of the data fall at ancestry 2, which also contains the highest value and the median. Lastly, I explored the relationships among the chemical measures themselves, to see whether they influence each other. A500, A650, and PTCA clearly have positive linear relationships with one another, and we can make predictions with the following models.

A500^0.8 = 3.437719 * A650 + 0.052589

sqrt(A500) = 1.086e-03 * PTCA + 1.762e-01

A650^0.6 = 0.0140647 * sqrt(PTCA) - 0.0260570
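To predict on the original scale, each fitted equation has to be inverted (undoing the power transformation); a sketch with illustrative helper functions, where the function names are mine and the coefficients are copied from the summaries above:

```r
# Back-transform each fitted equation to predict on the original scale.
# The helper names are illustrative, not from the source.
pred_A500_from_A650 <- function(a650) (3.437719 * a650 + 0.052589)^(1 / 0.8)
pred_A500_from_PTCA <- function(ptca) (1.086e-03 * ptca + 1.762e-01)^2
pred_A650_from_PTCA <- function(ptca) (0.0140647 * sqrt(ptca) - 0.0260570)^(1 / 0.6)

# Example: sample 1 has A650 = 0.104 and observed A500 = 0.310.
pred_A500_from_A650(0.104)  # close to the observed 0.310
```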